Install/load the tidyverse, gapminder, and ggthemes packages.
library(pacman)
p_load(tidyverse, gapminder, ggthemes)
- tidyverse: gives us access to data manipulation functions, as well as the ggplot2 package;
- gapminder: data source;
- ggthemes: provides us with themes for ggplots.
Recall the gapminder dataset:
head(gapminder, n = 15)
## # A tibble: 15 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## 11 Afghanistan Asia 2002 42.1 25268405 727.
## 12 Afghanistan Asia 2007 43.8 31889923 975.
## 13 Albania Europe 1952 55.2 1282697 1601.
## 14 Albania Europe 1957 59.3 1476505 1942.
## 15 Albania Europe 1962 64.8 1728137 2313.
ggplot2
What makes ggplot2 special is that it is based on
the Grammar of Graphics, which allows us to create
graphs by combining independent components. This makes
ggplot2 exceptionally flexible, and allows us to learn how
to generate graphs by mastering a set of core principles rather than
memorizing special approaches to each type of graph.
ggplot2 is designed to work iteratively. You start
with a layer that shows the raw data. Then you add layers of annotations
and statistical summaries.
The grammar of graphics describes the fundamental features that
underlie all statistical graphics: it answers the question
“What is a statistical graphic?” ggplot2 builds on the
grammar of graphics by focusing on layers and adapting it for use in R.
In brief, the grammar tells us that a graphic maps the data to the
aesthetic attributes (color, shape, size) of geometric objects (points,
lines, bars). The plot may also include statistical transformations of
the data and information about the plot’s coordinate system. The
combination of these independent components is what makes up a
graphic.
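To make these components concrete, here is a sketch that writes each one out explicitly for the GDP-versus-life-expectancy scatterplot used in the walkthrough. Most of these arguments (stat = "identity", the continuous scales, coord_cartesian()) are ggplot2's defaults, so spelling them out is for illustration only; the object name p_explicit is our own.

```r
library(ggplot2)
library(gapminder)

# Each grammar-of-graphics component written out explicitly.
# These are mostly defaults, so the short form
# ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point()
# draws the same plot.
p_explicit <- ggplot(data = gapminder,                             # data
                     mapping = aes(x = gdpPercap, y = lifeExp)) +  # aesthetic mapping
  geom_point(stat = "identity") +  # geometric object + (identity) statistical transformation
  scale_x_continuous() +           # scale for the x aesthetic
  scale_y_continuous() +           # scale for the y aesthetic
  coord_cartesian()                # coordinate system
p_explicit
```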
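All plots are composed of the data (the information you want to visualize) and a mapping (the description of how the data’s variables are mapped to aesthetic attributes). There are five mapping components:
- layers: collections of geometric elements (geoms, such as points, lines, and bars) and statistical transformations (stats, such as the binning and counting behind a histogram);
- scales: map values in the data space to values in the aesthetic space (color, shape, size) and draw the axes and legends used to read the plot;
- a coordinate system (coord): describes how data coordinates are mapped to the plane of the graphic;
- facets: specify how to split the data into subsets and display them as small multiples;
- a theme: controls the finer points of display, such as font size and background color.
ggplot2 Walkthrough
Every ggplot2 plot has three key components:
- data;
- a set of aesthetic mappings between variables in the data and visual properties;
- at least one layer describing how to render each observation, usually created with a geom function.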
Example:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
In the above plot:
- data: gapminder;
- aesthetic mapping: GDP per capita (gdpPercap) mapped to the \(x\)-axis, life expectancy (lifeExp) mapped to the \(y\)-axis;
- layer: points, drawn with geom_point().
Notice that the data and aesthetic mappings are supplied in
ggplot(). Layers are then added with +.
Without a layer, the plot contains no geometric objects, so no data is drawn:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
The proper way to construct plots using ggplot2 is by
adding components iteratively!
Also, note that each new command (separated by +) is on
a new line. I recommend sticking to this convention to make your code
more readable.
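When splitting a plot over several lines, the + has to end a line rather than start the next one. A minimal sketch of both versions; the broken one is commented out because it stops with an error (the object names p_ok and p_bad are illustrative):

```r
library(ggplot2)
library(gapminder)

# Correct: "+" ends the first line, so R knows the expression continues.
p_ok <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Incorrect (commented out): with "+" at the start of the second line, R treats
# the first line as a complete statement (an empty plot), and the second line
# is then evaluated on its own and fails with an error.
# p_bad <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
#   + geom_point()
```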
Let’s summarize our data first (mean GDP per capita in each year) and use a
different geom, geom_line(), to draw the result as a line
plot:
ggplot(gapminder |> group_by(year) |> summarize(gdpPercap = mean(gdpPercap)),
aes(x = year, y = gdpPercap)) +
geom_line()
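A variation on the same idea, as a sketch: computing the summary in its own step keeps the ggplot() call short, and grouping by continent as well as year gives one line per continent. It uses the color aesthetic, which is introduced later in this section; the object names gdp_by_continent and p_lines are illustrative.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Step 1: mean GDP per capita for every continent-year combination
# (5 continents x 12 survey years = 60 rows)
gdp_by_continent <- gapminder |>
  group_by(continent, year) |>
  summarize(gdpPercap = mean(gdpPercap), .groups = "drop")

# Step 2: plot the summary, one colored line per continent
p_lines <- ggplot(gdp_by_continent, aes(x = year, y = gdpPercap, color = continent)) +
  geom_line()
p_lines
```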
Now let’s use geom_histogram() to plot a histogram of
lifeExp in the raw
dataset:
ggplot(gapminder, aes(lifeExp)) +
geom_histogram()
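By default, geom_histogram() splits the data into 30 bins and prints a message suggesting that you pick a better value. A minimal sketch of choosing the bin width explicitly; the 5-year width is an arbitrary choice for illustration, and the bins argument sets the number of bins instead.

```r
library(ggplot2)
library(gapminder)

# Each bar now covers 5 years of life expectancy
p_hist <- ggplot(gapminder, aes(lifeExp)) +
  geom_histogram(binwidth = 5)
p_hist
```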
Similarly, we may use geom_density() to create
a density plot of lifeExp in our
sample:
ggplot(gapminder, aes(lifeExp)) +
geom_density()
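A density plot is a smoothed histogram, and the amount of smoothing is set by the kernel bandwidth. As a sketch, the adjust argument multiplies the default bandwidth: values below 1 give a wigglier curve, values above 1 a smoother one (0.5 here is an arbitrary choice).

```r
library(ggplot2)
library(gapminder)

# Half the default bandwidth: less smoothing, more visible bumps
p_dens <- ggplot(gapminder, aes(lifeExp)) +
  geom_density(adjust = 0.5)
p_dens
```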
We may also specify aesthetic attributes such as color
(color), size (size), and shape
(shape) inside of aes().
Let’s specify the color aesthetic in our graph by
including color = variable inside of aes(). In
this case, let’s color our observations by continent:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point()
This gives each point a unique color corresponding to its associated continent. The legend allows us to read data values from the color.
Similarly, we may express the continent category of each observation
by specifying the shape aesthetic (although shapes are harder to tell
apart than colors).
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, shape = continent)) +
geom_point()
We may also specify the size aesthetic in our graph by
including size = variable inside of aes(). In
this case, let’s associate point size with population (in millions):
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop/1000000)) +
geom_point()
We can specify multiple aesthetics at the same time. For example, we
may do color = continent and
size = pop/1000000 at the same time:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop/1000000)) +
geom_point()
ggplot2 takes care of the details of converting data
(e.g., ‘Africa’, ‘Asia’, ‘Europe’) into aesthetics (e.g., ‘red’,
‘yellow’, ‘green’) with a scale. There is one scale for
each aesthetic mapping in a plot. The scale is also responsible for
creating a guide, an axis or legend, that allows you to read the plot,
converting aesthetic values back into data values. We stick with the
default scales provided by ggplot2, but it is possible to
override them.
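As a sketch of overriding the defaults: GDP per capita is heavily right-skewed, so a log10 \(x\)-axis spreads the points out, and renaming the size scale replaces the raw expression pop/1000000 in the legend title. scale_x_log10() and scale_size_continuous() are standard ggplot2 scale functions; the legend title text is our own choice.

```r
library(ggplot2)
library(gapminder)

p_scales <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop / 1000000)) +
  geom_point() +
  scale_x_log10() +                                      # log10-transformed x-axis
  scale_size_continuous(name = "Population (millions)")  # readable legend title
p_scales
```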
If you want to set an aesthetic to a fixed value, without scaling it,
set it in the individual layer, outside of aes(). The last example
also sets alpha (transparency, from 0 to 1), which helps when points overlap:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(color = "red")
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(shape = 3)
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(size = 5)
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(color = "red", shape = 2, size = 5)
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 3, alpha = 1/2)
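A common mistake is to put a constant inside aes(). ggplot2 then treats "red" as a data value, maps it through the default color scale, and adds a legend entry labelled "red", so the points are not actually drawn in red. A minimal comparison (object names are illustrative):

```r
library(ggplot2)
library(gapminder)

# Pitfall: "red" inside aes() is a one-level variable, colored by the default scale
p_inside <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = "red")) +
  geom_point()

# Correct: a fixed color is set in the layer, outside aes()
p_outside <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "red")
```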
Another technique for displaying additional categorical variables on a plot is faceting. Faceting creates tables of graphics by splitting the data into subsets and displaying the same graph for each subset.
There are two types of faceting: grid (facet_grid()) and wrapped (facet_wrap()). Wrapped faceting is usually the more convenient of the two for a single variable, so we’ll focus on it here. To facet a plot, add a faceting specification with facet_wrap(), which takes the name of a variable preceded by ~:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
facet_wrap(~continent)
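A sketch of two variations: facet_wrap() accepts nrow or ncol to control the panel layout, and facet_grid() places the levels of one variable in rows and/or another in columns. In the formula continent ~ ., the dot means "no faceting variable for the columns", so each continent gets its own row.

```r
library(ggplot2)
library(gapminder)

# Wrapped facets: all five continent panels in a single row
p_wrap <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(~continent, nrow = 1)

# Grid facets (rows ~ columns): one row per continent, no column variable
p_grid <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_grid(continent ~ .)
```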
We may also modify plot labels with labs():
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 3, alpha = 1/2) +
labs(x = "GDP per Capita",
y = "Life Expectancy (Years)",
color = "Continent",
title = "GDP per Capita vs. Life Expectancy")
Lastly, the ggthemes package gives us access to a
variety of themes. Let’s try out a few of them. But first, we save our
plot as an object:
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 3, alpha = 1/2) +
labs(x = "GDP per Capita",
y = "Life Expectancy (Years)",
color = "Continent",
title = "GDP per Capita vs. Life Expectancy")
Now, printing p displays our plot:
p
Let’s try out the Wall St. Journal theme:
p + theme_wsj()
The Tufte theme is a classic:
p + theme_tufte()
The Economist theme:
p + theme_economist()
An alternative version of The Economist theme, with a white background:
p + theme_economist_white()
The old Excel theme:
p + theme_excel()
The new Excel theme:
p + theme_excel_new()
The Google Docs theme:
p + theme_gdocs()
The LibreOffice Calc theme:
p + theme_calc()
A favorite of mine, the minimalistic theme (theme_minimal() ships with ggplot2 itself rather than ggthemes):
p + theme_minimal()
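Complete themes can also be tweaked. As a sketch, the block below rebuilds a simplified version of the plot as p_base (so it runs on its own), starts from theme_minimal() with a larger base font, and then uses theme() to move the legend below the plot. base_size and legend.position are standard ggplot2 theme settings; the values chosen are arbitrary.

```r
library(ggplot2)
library(gapminder)

p_base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 3, alpha = 1/2)

# Complete theme first, then individual tweaks with theme();
# a complete theme added afterwards would overwrite these tweaks
p_custom <- p_base +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")
p_custom
```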